Abstract

We study the performance of Fictitious Play, when used as a heuristic for finding an approximate Nash equilibrium of a 2-player game. We exhibit a class of 2-player games having payoffs in the range [0, 1] that show that Fictitious Play fails to find a solution having an additive approximation guarantee significantly better than 1/2. Our construction shows that for $n \times n$ games, in the worst case both players may perpetually have mixed strategies whose payoffs fall short of the best response by an additive quantity $1/2 - O(1/n^{1-\delta})$ for arbitrarily small $\delta$. We also show an essentially matching upper bound of $1/2 - O(1/n)$.






1 Introduction



Fictitious Play is a very simple iterative process for computing equilibria of games. A detailed motivation for it is given in [9]. When it converges, it necessarily converges to a Nash equilibrium. For 2-player games, it is known to converge for zero-sum games [10], or if one player has just 2 strategies [2]. On the other hand, Shapley exhibited a $3 \times 3$ game for which it fails to converge [9, 11].

Fictitious Play (FP) works as follows. Suppose that each player has a number of actions, or pure strategies. Initially (at iteration 1) each player starts with a single action. Thereafter, at iteration $t$, each player has a sequence of $t-1$ actions which is extended with a $t$-th action chosen as follows. Each player makes a best response to a distribution consisting of the selection of an opponent's strategy uniformly at random from his sequence. (To make the process precise, a tie-breaking rule should also be specified; however, in the games constructed here, there will be no ties.) Thus the process generates a sequence of mixed-strategy profiles (viewing the sequences as probability distributions), and the hope is that they converge to a limiting distribution, which would necessarily be a Nash equilibrium.
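The iteration just described can be sketched in a few lines of code. This is only an illustration under stated assumptions, not the authors' implementation: the names `best_response` and `fictitious_play`, the 0-based action indices, and the lowest-index tie-breaking rule are all choices made for this sketch.

```python
def best_response(payoff_rows, opponent_counts):
    """Return the action maximizing total payoff against the opponent's counts.

    Counts can be used instead of probabilities because the argmax does not
    change under rescaling. Ties go to the lowest index (an assumption).
    """
    scores = [sum(p * c for p, c in zip(row, opponent_counts))
              for row in payoff_rows]
    return max(range(len(scores)), key=scores.__getitem__)


def fictitious_play(R, C, steps, start=(0, 0)):
    """Run `steps` iterations of FP on the bimatrix game (R, C).

    R[i][j] and C[i][j] are the row and column player's payoffs when the row
    player plays i and the column player plays j. Returns both sequences.
    """
    n_rows, n_cols = len(R), len(R[0])
    # Transpose C so that each inner list is one column-player action.
    C_T = [[C[i][j] for i in range(n_rows)] for j in range(n_cols)]
    rows, cols = [start[0]], [start[1]]
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    row_counts[start[0]] = 1
    col_counts[start[1]] = 1
    for _ in range(steps - 1):
        # Both players respond to the opponent's history before either updates.
        r = best_response(R, col_counts)
        c = best_response(C_T, row_counts)
        rows.append(r)
        cols.append(c)
        row_counts[r] += 1
        col_counts[c] += 1
    return rows, cols
```

Dividing the final counts by the number of steps gives the mixed-strategy profile whose quality the rest of the paper analyzes.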

The problem of computing approximate equilibria was motivated by the apparent intrinsic hardness of computing exact equilibria [8], even in the 2-player case [?]. An $\epsilon$-Nash equilibrium is one where each player's strategy has a payoff of at most $\epsilon$ less than the best response. Formally, for 2 players with pure strategy sets $M$, $N$ and payoff functions $u_i : M \times N \to \mathbb{R}$ for $i \in \{1, 2\}$, the mixed strategy $\sigma$ is an $\epsilon$-best-response against the mixed strategy $\tau$, if for any $m \in M$, we have $u_1(\sigma, \tau) \ge u_1(m, \tau) - \epsilon$. A pair of strategies $\sigma$, $\tau$ is an $\epsilon$-Nash equilibrium if they are $\epsilon$-best responses to each other. Typically one assumes that the payoffs of a game are rescaled to lie in [0, 1], and then a general question is: for what values of $\epsilon$ does some proposed algorithm guarantee to find $\epsilon$-Nash equilibria? Previously, the focus has been on various algorithms that run in polynomial time. Our result for FP applies without any limit on the number of iterations; we show that a kind of cyclical behavior persists.
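To make the definition of an $\epsilon$-Nash equilibrium concrete, the following sketch computes the smallest $\epsilon$ for which a given mixed-strategy pair is an $\epsilon$-Nash equilibrium. The function name `epsilon_of_profile` and the list-of-lists payoff representation are assumptions of this example.

```python
def epsilon_of_profile(R, C, x, y):
    """Smallest eps such that (x, y) is an eps-Nash equilibrium of (R, C).

    x and y are probability vectors over the row and column player's pure
    strategies. The result is the larger of the two players' regrets.
    """
    n_rows, n_cols = len(R), len(R[0])
    # Expected payoff of every pure row strategy against y.
    row_vs_y = [sum(R[i][j] * y[j] for j in range(n_cols))
                for i in range(n_rows)]
    # Expected payoff of every pure column strategy against x.
    col_vs_x = [sum(C[i][j] * x[i] for i in range(n_rows))
                for j in range(n_cols)]
    row_payoff = sum(x[i] * row_vs_y[i] for i in range(n_rows))
    col_payoff = sum(y[j] * col_vs_x[j] for j in range(n_cols))
    return max(max(row_vs_y) - row_payoff, max(col_vs_x) - col_payoff)
```

For example, in matching pennies with payoffs in [0, 1] the uniform profile has regret 0, while a pure profile leaves one player with regret 1.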

Department of Computer Science, University of Liverpool, Ashton Street, Liverpool L69 3BX, U.K.
Department of Computer Science, University of Warwick, Gibbet Hill Rd, Coventry CV4 7AL, U.K.

A recent paper of Conitzer [5] shows that FP obtains an approximation guarantee of $\epsilon = (t+1)/2t$ for 2-player games, where $t$ is the number of FP iterations, and furthermore, if both players have access to infinitely many strategies, then FP cannot do better than this. The intuition behind this upper bound is that an action that appears most recently in a player's sequence has an $\epsilon$-value close to 0 (at most $1/t$); generally a strategy that occurs a fraction $\gamma$ back in the sequence has an $\epsilon$-value of at most slightly more than $\gamma$ (it is a best response to slightly less than $1 - \gamma$ of the opponent's distribution), and the $\epsilon$-value of a player's mixed strategy is at most the overall average, i.e., $(t+1)/2t$, which approaches 1/2 as $t$ increases.
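The averaging step in this intuition can be checked with exact arithmetic. The sketch below assumes that the action at position $i$ of a length-$t$ sequence has an $\epsilon$-value of at most $(t - i + 1)/t$, which is one reading of "a fraction $\gamma$ back"; the function name `conitzer_average` is hypothetical.

```python
from fractions import Fraction


def conitzer_average(t):
    """Average of the per-position bounds (t - i + 1)/t for i = 1..t."""
    per_position = [Fraction(t - i + 1, t) for i in range(1, t + 1)]
    return sum(per_position) / t
```

Under that assumption the average equals $(t+1)/2t$ exactly, since $\sum_{i=1}^{t}(t-i+1) = t(t+1)/2$.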

However, as soon as the number of available pure strategies is exceeded by the number of iterations of FP, various pure strategies must get re-used, and this re-usage means, for example, that all previous occurrences of the most recent action also have $\epsilon$-values of at most $1/t$. This appears to hold out the hope that FP may ultimately guarantee a significantly better additive approximation. We show that unfortunately that is not what results in the worst case. Our hope is that this result may either guide the design of more "intelligent" dynamics having a better approximation performance, or alternatively generalize to a wider class of related algorithms, for example the ones discussed in [6].

In Section 2 we give our main result, the lower bound of $1/2 - O(1/n^{1-\delta})$ for any $\delta > 0$, and in Section 3 we give the corresponding upper bound of $1/2 - O(1/n)$.

2 Lower Bound 
Figure 1: The game belonging to the class of games used to prove the lower bound.



We specify a class of games with parameter $n$, whose general idea is conveyed in Figure 1, which shows the row player's matrix for $n = 5$; the column player's matrix is its transpose. A blank entry indicates the value zero; let $\alpha = 1 + \frac{1}{n^{1-\delta}}$ and $\beta = 1 - \frac{1}{n^{2(1-\delta)}}$ for $\delta > 0$. Both players start at strategy 1 (top left). Generally, let $G_n$ be a $4n \times 4n$ game in which the column player's payoff matrix is the transpose of the row player's payoff matrix $R$, which itself is specified as follows. For $i, j \in [4n]$ we have

• If $i \in [2 : n]$, $R_{i,i-1} = 1$. If $i \in [n+1 : 4n]$, $R_{i,i} = 1$.

• If $i \in [n+1 : 4n]$, $R_{i,i-1} = \alpha$. Also, $R_{2n+1,4n} = \alpha$.

• Otherwise, if $i > j$ and $j \le 2n$, $R_{i,j} = \beta$.

• Otherwise, if $i > j$ and $i - j \le n$, $R_{i,j} = \beta$. If $j \in [3n+1 : 4n]$, $i \in [2n+1 : j-n]$, $R_{i,j} = \beta$.

• Otherwise, $R_{i,j} = 0$.
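The case analysis above translates directly into code. The sketch below is an illustration only: the name `build_R` is hypothetical, and it reads the two "Otherwise" conditions as $j \le 2n$ and $i - j \le n$, which is the reading consistent with the payoff expressions used in the proofs of Lemmas 3 and 4.

```python
def build_R(n, delta):
    """Row player's payoff matrix of the 4n x 4n game, plus alpha and beta.

    Strategies are numbered 1..4n as in the text; the returned nested list is
    0-indexed, so entry R_{i,j} of the text is R[i - 1][j - 1].
    """
    m = n ** (1.0 - delta)
    alpha = 1.0 + 1.0 / m
    beta = 1.0 - 1.0 / m ** 2
    N = 4 * n
    R = [[0.0] * N for _ in range(N)]
    for i in range(1, N + 1):
        for j in range(1, N + 1):
            if (2 <= i <= n and j == i - 1) or (i >= n + 1 and j == i):
                value = 1.0
            elif (i >= n + 1 and j == i - 1) or (i == 2 * n + 1 and j == N):
                value = alpha
            elif i > j and j <= 2 * n:
                value = beta
            elif (i > j and i - j <= n) or (
                    j >= 3 * n + 1 and 2 * n + 1 <= i <= j - n):
                value = beta
            else:
                value = 0.0
            R[i - 1][j - 1] = value
    return R, alpha, beta
```

The column player's matrix is the transpose of the returned matrix, so `R[j][i]` is her payoff when the row player plays `i` and she plays `j` (0-indexed).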
For ease of presentation we analyze FP on $G_n$; the obtained results can be seen to apply to a version of $G_n$ with payoffs rescaled into [0, 1] (cf. the proof of Theorem 2).

2.1 Overview 

In this section we give a general overview and intuition on how our main result works, before embarking on the technical details. Number the strategies $1, \ldots, 4n$ from top to bottom and left to right, and assume that both players start at strategy 1. Fictitious Play proceeds in a sequence of steps, which we index by positive integer $t$, so that step $t$ consists of both players adding the $t$-th element to their sequences of length $t - 1$. We have the following observation:

Observation 1 Since the column player's payoff matrix is the transpose of the row player's, at every step both players play the same action.

This simplifies the analysis since it means we are analyzing a single sequence of numbers (the shared indices of the actions chosen by the players).

A basic insight into the behavior of Fictitious Play on the games in question is provided by Lemma 1, which tells us a great deal about the structure of the players' sequence. Let $s_t$ be the action played at step $t$. We set $s_1 = 1$.

Lemma 1 For any time step $t$, if $s_t \ne s_{t+1}$ then $s_{t+1} = s_t + 1$ (or $s_{t+1} = 2n + 1$ if $s_t = 4n$).

Proof. The first $n$ steps are similar to [5]. For step $t > n$, suppose the players play $s_t \ne 4n$ (by Observation 1, the two players play the same strategy). $s_t$ is a best response at step $t$, and since $R_{s_t+1,s_t} > R_{s_t,s_t} > R_{j,s_t}$ ($j \notin \{s_t, s_t + 1\}$), strategy $s_t + 1$ is the only other candidate to become a better response after $s_t$ is played. Thus, if $s_{t+1} \ne s_t$, then $s_{t+1} = s_t + 1$. Similar arguments apply to the case $s_t = 4n$. □

The lemma implies that the sequence consists of a block of consecutive 1's followed by some consecutive 2's, and so on through all the actions in ascending order until we get to a block of consecutive $4n$'s. The blocks of consecutive actions then cycle through the actions $\{2n + 1, \ldots, 4n\}$ in order, and continue to do so repeatedly.
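The block structure described above is easy to test on any recorded sequence of actions. The helpers below are a sketch with hypothetical names (`blocks`, `obeys_lemma1`); they take a sequence of 1-based action indices and check only the successor rule of Lemma 1, not the block lengths.

```python
from itertools import groupby


def blocks(seq):
    """Run-length encode a sequence into (action, block_length) pairs."""
    return [(action, len(list(group))) for action, group in groupby(seq)]


def obeys_lemma1(seq, n):
    """Check that every change of action goes from s to s + 1,
    or from 4n back to 2n + 1."""
    for s, s_next in zip(seq, seq[1:]):
        if s_next == s:
            continue
        expected = 2 * n + 1 if s == 4 * n else s + 1
        if s_next != expected:
            return False
    return True
```

For instance, with $n = 1$ the sequence 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 3, ... has the required shape: ascending blocks up to $4n = 4$, followed by a return to $2n + 1 = 3$.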

As it stands, the lemma makes no promise about the lengths of these blocks, and indeed it does not itself rule out the possibility that one of these blocks is infinitely long (which would end the cycling process described above and cause FP to converge to a pure Nash equilibrium). The subsequent results say more about the lengths of the blocks. They show that in fact the process never converges (it cycles infinitely often) and furthermore, the lengths of the blocks increase in geometric progression. The parameters $\alpha$ and $\beta$ in $G_n$ govern the ratio between the lengths of consecutive blocks. We choose a ratio large enough to ensure that the $n$ strategies most recently played occupy all but an exponentially small fraction of the sequence. At the same time the ratio is small enough that the corresponding probability distribution does not allocate much probability to any individual strategy.
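The two competing requirements on the ratio can be illustrated numerically. The sketch below idealizes the block lengths as an exact geometric progression with ratio $(\alpha-\beta)/(\alpha-1) = 1 + 1/n^{1-\delta}$ extending infinitely far back; that idealization, and the name `geometric_block_stats`, are assumptions of this example rather than statements from the paper.

```python
def geometric_block_stats(n, delta):
    """Return (ratio, tail_share, top_share) for idealized geometric blocks.

    ratio      : (alpha - beta) / (alpha - 1), the growth factor per block
    tail_share : fraction of the sequence outside the n most recent blocks
    top_share  : fraction of the sequence taken by the single latest block
    """
    m = n ** (1.0 - delta)
    alpha = 1.0 + 1.0 / m
    beta = 1.0 - 1.0 / m ** 2
    ratio = (alpha - beta) / (alpha - 1.0)
    # With block weights ratio**(-k) for k = 0, 1, 2, ... (k = 0 most recent),
    # the total is 1 / (1 - 1/ratio); the blocks with k >= n sum to
    # ratio**(-n) times that total.
    tail_share = ratio ** (-n)
    top_share = 1.0 - 1.0 / ratio
    return ratio, tail_share, top_share
```

With $n = 100$ and $\delta = 1/2$ the ratio is 1.1, so under this idealization the older blocks are negligible while the latest block still holds only about one eleventh of the mass.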

As an aside, we conclude with the following observation, which is not hard to check from the structure of the 
game. 

Observation 2 The game has a mixed Nash equilibrium in which both players use the uniform distribution over strategies $\{2n + 1, \ldots, 4n\}$. The equilibrium has payoff approximately $\frac{1}{2}$ to each player. There are no pure Nash equilibria, although if both players use the same strategy in the range $\{n + 1, \ldots, 4n\}$ then they would receive payoff 1. Recall that $\alpha > 1$, so a payoff of 1 to each player does not imply an equilibrium.

2.2 The proof 

We now identify some properties of probabilities assigned to strategies by FP. We let $\ell_t(i)$ be the number of times that strategy $i$ is played by the players until time step $t$ of FP. Let $p_t(i)$ be the corresponding probability assigned by the players to strategy $i$ at step $t$; also, for any subset of actions $S$ we use $p_t(S)$ to denote the total probability of elements of $S$. So it is immediate to observe that

$$p_t(i) = \frac{\ell_t(i)}{t}.$$

The next fact easily follows from the FP rule.

Lemma 2 For all strategies $i \le n$, $p_t(i) = \frac{1}{t}$ and therefore $\ell_t(i) = 1$, for any time step $t \ge i$.

Proof. At step 1, each player sets $p_1(1) = 1$ and $p_1(i) = 0$ for $i > 1$. As in [5], for $t \le n$ the sequence chosen by both players is $(1, 2, \ldots, t)$, so $p_t(i) = \frac{1}{t}$ for $i \le t$ and 0 otherwise. Lemma 1 implies that none of the first $n$ strategies will be a best response subsequently, thus implying the claim. □

By Lemma 1, each strategy is played a number of consecutive times, in order, until the strategy $4n$ is played; at this point, this same pattern repeats but only for the strategies in $\{2n + 1, \ldots, 4n\}$. We let $t^*$ be the length of the longest sequence containing all the strategies in ascending order, that is, $t^*$ is the last step of the first consecutive block of $4n$'s. We also let $t_i$ be the last time step in which $i$ is played during the first $t^*$ steps, i.e., $t_i$ is such that $\ell_{t_i}(i) = \ell_{t_i - 1}(i) + 1$ and $\ell_t(i) = \ell_{t^*}(i)$ for $t \in \{t_i, \ldots, t^*\}$.

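Lemma 3 For all strategies $n + 1 \le i \le 3n$ and all $t \in \{t_i, \ldots, t^*\}$, it holds:

$$\frac{\alpha-\beta}{\alpha-1}\, p_t(i-1) \le p_t(i) \le \frac{1}{t} + \frac{\alpha-\beta}{\alpha-1}\, p_t(i-1)$$

and therefore,

$$\frac{\alpha-\beta}{\alpha-1}\, \ell_t(i-1) \le \ell_t(i) \le 1 + \frac{\alpha-\beta}{\alpha-1}\, \ell_t(i-1).$$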

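Proof. By definition of $t_i$, strategy $i$ is played at step $t_i$. This means that $i$ is a best response for the players given the probability distributions at step $t_i - 1$. In particular, the expected payoff of $i$ is better than the expected payoff of $i + 1$, that is,

$$\beta \sum_{j=1}^{i-2} p_{t_i-1}(j) + \alpha\, p_{t_i-1}(i-1) + p_{t_i-1}(i) \ge \beta \sum_{j=1}^{i-2} p_{t_i-1}(j) + \beta\, p_{t_i-1}(i-1) + \alpha\, p_{t_i-1}(i).$$

Since $\alpha > 1$, the above implies that $p_{t_i-1}(i) \le \frac{\alpha-\beta}{\alpha-1}\, p_{t_i-1}(i-1)$. By explicitly writing the probabilities, we get

$$\frac{\ell_{t_i-1}(i)}{t_i - 1} \le \frac{\alpha-\beta}{\alpha-1}\, \frac{\ell_{t_i-1}(i-1)}{t_i - 1},$$

$$\ell_{t_i}(i) - 1 \le \frac{\alpha-\beta}{\alpha-1}\, \ell_{t_i}(i-1), \qquad (1)$$

$$\frac{\ell_{t_i}(i)}{t_i} \le \frac{1}{t_i} + \frac{\alpha-\beta}{\alpha-1}\, \frac{\ell_{t_i}(i-1)}{t_i},$$

$$p_{t_i}(i) \le \frac{1}{t_i} + \frac{\alpha-\beta}{\alpha-1}\, p_{t_i}(i-1). \qquad (2)$$

At step $t_i + 1$ strategy $i$ is not a best response to the opponent's strategy. Then, by Lemma 1, $i + 1$ is the unique best response and so the expected payoff of $i + 1$ is better than the expected payoff of $i$ given the probability distributions at step $t_i$, that is,

$$\beta \sum_{j=1}^{i-2} p_{t_i}(j) + \alpha\, p_{t_i}(i-1) + p_{t_i}(i) \le \beta \sum_{j=1}^{i-2} p_{t_i}(j) + \beta\, p_{t_i}(i-1) + \alpha\, p_{t_i}(i).$$

Since $\alpha > 1$, the above implies that

$$p_{t_i}(i) \ge \frac{\alpha-\beta}{\alpha-1}\, p_{t_i}(i-1) \qquad (3)$$

and then that

$$\ell_{t_i}(i) \ge \frac{\alpha-\beta}{\alpha-1}\, \ell_{t_i}(i-1). \qquad (4)$$

By definition of $t_i$, action $i$ will not be played anymore until time step $t^*$. Similarly, Lemma 1 shows that $i - 1$ will not be a best response twice in the time interval $[1, t^*]$ and so will not be played until step $t^*$. Therefore, the claim follows from (1), (2), (3) and (4). □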

Lemma 4 For all strategies $i \in \{3n + 1, \ldots, 4n - 1\}$ and all $t \in \{t_i, \ldots, t^*\}$, it holds:

$$\frac{\alpha-\beta}{\alpha-1}\, p_t(i-1) \le p_t(i) \le \frac{1}{t} + \frac{\alpha-\beta}{\alpha-1}\, p_t(i-1) + \frac{\beta}{\alpha-1}\, p_t(i-n)$$

and therefore,

$$\frac{\alpha-\beta}{\alpha-1}\, \ell_t(i-1) \le \ell_t(i) \le 1 + \frac{\alpha-\beta}{\alpha-1}\, \ell_t(i-1) + \frac{\beta}{\alpha-1}\, \ell_t(i-n).$$

Proof. By definition of $t_i$, strategy $i$ is played at time step $t_i$. This means that $i$ is a best response for the players after $t_i - 1$ steps. In particular, the expected payoff of $i$ is better than the expected payoff of $i + 1$, that is,

$$\beta \left( \sum_{j=1}^{2n} p_{t_i-1}(j) + \sum_{j=i-n}^{i-2} p_{t_i-1}(j) \right) + \alpha\, p_{t_i-1}(i-1) + p_{t_i-1}(i) \ge \beta \left( \sum_{j=1}^{2n} p_{t_i-1}(j) + \sum_{j=i-n+1}^{i-2} p_{t_i-1}(j) \right) + \beta\, p_{t_i-1}(i-1) + \alpha\, p_{t_i-1}(i).$$

Since $\alpha > 1$, the above implies that $p_{t_i-1}(i) \le \frac{\alpha-\beta}{\alpha-1}\, p_{t_i-1}(i-1) + \frac{\beta}{\alpha-1}\, p_{t_i-1}(i-n)$. Similarly to the proof of Lemma 3 above this can be shown to imply

$$p_{t_i}(i) \le \frac{1}{t_i} + \frac{\alpha-\beta}{\alpha-1}\, p_{t_i}(i-1) + \frac{\beta}{\alpha-1}\, p_{t_i}(i-n), \qquad (5)$$

$$\ell_{t_i}(i) \le 1 + \frac{\alpha-\beta}{\alpha-1}\, \ell_{t_i}(i-1) + \frac{\beta}{\alpha-1}\, \ell_{t_i}(i-n). \qquad (6)$$

At time step $t_i + 1$ strategy $i$ is not a best response to the opponent's strategy. By Lemma 1, $i + 1$ is the unique best response and so the expected payoff of $i + 1$ is better than the expected payoff of $i$, thus implying that

$$p_{t_i}(i) \ge \frac{\alpha-\beta}{\alpha-1}\, p_{t_i}(i-1) + \frac{\beta}{\alpha-1}\, p_{t_i}(i-n) \ge \frac{\alpha-\beta}{\alpha-1}\, p_{t_i}(i-1), \qquad (7)$$

$$\ell_{t_i}(i) \ge \frac{\alpha-\beta}{\alpha-1}\, \ell_{t_i}(i-1) + \frac{\beta}{\alpha-1}\, \ell_{t_i}(i-n) \ge \frac{\alpha-\beta}{\alpha-1}\, \ell_{t_i}(i-1). \qquad (8)$$

Similarly to Lemma 3, the claim follows from (5), (6), (7) and (8), the definition of $t_i$ and the fact that, by Lemma 1, a strategy belonging to $\{3n + 1, \ldots, 4n - 1\}$ is never twice a best response in the time interval $[1, t^*]$. □

The next lemma shows that we can "forget" about the first $2n$ actions at the cost of paying an exponentially small addend in the payoff function.

Lemma 5 For any $\delta > 0$, $\alpha = 1 + \frac{1}{n^{1-\delta}}$ and $\beta = 1 - \frac{1}{n^{2(1-\delta)}}$, $\sum_{j=1}^{2n} p_{t^*}(j) \le 2^{-n^{\delta}}$.

Proof. We first rewrite and upper bound the sum of the probabilities we are interested in: 



2n 2n 



E-li^AJ) 



1 1 

< 



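Note that by Lemmata 2, 3 and 4 we have that

$$\ell_{t^*}(4n-1) \ge \frac{\alpha-\beta}{\alpha-1}\, \ell_{t^*}(4n-2) \ge \left(\frac{\alpha-\beta}{\alpha-1}\right)^{2} \ell_{t^*}(4n-3) \ge \cdots \ge \left(\frac{\alpha-\beta}{\alpha-1}\right)^{3n-1} \ell_{t^*}(n) = \left(\frac{\alpha-\beta}{\alpha-1}\right)^{3n-1}.$$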


By plugging in the values of a and /5 given in the hypothesis we have that 

((1 + ^) ^ ^ 2 ^ 



- ' 

where the penultimate inequality follows from the observation that the function $(1 + 1/x)^x > 2$ for $x > 2$. □

The theorem below generalizes the above arguments to the cycles that FP visits in the last block of the game, i.e., the block which comprises strategies $S = \{2n + 1, \ldots, 4n\}$. Since we focus on this part of the game, to ease the presentation, our notation uses circular arithmetic on the elements of $S$. For example, the action $j + 2$ will denote action $2n + 2$ for $j = 4n$ and the action $j - n$ will be the strategy $3n + 1$ for $j = 2n + 1$. Note that under this notation $j - 2n = j + 2n = j$ for each action $j$ in the block.

Theorem 1 For any $\delta > 0$, $\alpha = 1 + \frac{1}{n^{1-\delta}}$ and $\beta = 1 - \frac{1}{n^{2(1-\delta)}}$, $n$ sufficiently large, any $t \ge t^*$ we have

$$\frac{p_t(i)}{p_t(i-1)} \ge 1 + \frac{1}{n^{1-\delta}} \quad \text{for all } i \in S \text{ with } i \ne s_t, s_t + 1,$$

$$\frac{p_t(i)}{p_t(i-1)} \le 1 + \frac{3}{n^{1-\delta}} \quad \text{for all } i \in S.$$

Proof. The proof is by induction on $t$.

Base. For the base of the induction, consider $t = t^*$ and note that at that point $s_{t^*} = 4n$ and $s_{t^*} + 1 = 2n + 1$. Therefore we need to show the lower bound for any strategy $i \in \{2n + 2, \ldots, 4n - 1\}$. From Lemmata 3 and 4 we note that for $i \ne 4n, 2n + 1$,

$$\frac{p_{t^*}(i)}{p_{t^*}(i-1)} \ge \frac{\alpha-\beta}{\alpha-1} = 1 + \frac{1}{n^{1-\delta}}.$$
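As for the upper bound, we first consider the case of $i \ne 4n, 2n + 1$. Lemma 3 implies that for $i = 2n + 2, \ldots, 3n$,

$$\frac{p_{t^*}(i)}{p_{t^*}(i-1)} \le \frac{1}{t^*} + \frac{\alpha-\beta}{\alpha-1},$$

while Lemma 4 implies that for $i = 3n + 1, \ldots, 4n - 1$,

$$\frac{p_{t^*}(i)}{p_{t^*}(i-1)} \le \frac{1}{t^*} + \frac{\alpha-\beta}{\alpha-1} + \frac{\beta}{\alpha-1}\, \frac{p_{t^*}(i-n)}{p_{t^*}(i-1)} = \frac{1}{t^*} + \frac{\alpha-\beta}{\alpha-1} + \frac{\beta}{\alpha-1}\, \frac{\ell_{t^*}(i-n)}{\ell_{t^*}(i-1)}.$$

To give a unique upper bound for both cases, we only focus on the above (weaker) upper bound and next are going to focus on the ratio $\frac{\ell_{t^*}(i-n)}{\ell_{t^*}(i-1)}$. We use Lemmata 3 and 4 and get

$$\ell_{t^*}(i-1) \ge \frac{\alpha-\beta}{\alpha-1}\, \ell_{t^*}(i-2) \ge \left(\frac{\alpha-\beta}{\alpha-1}\right)^{2} \ell_{t^*}(i-3) \ge \cdots \ge \left(\frac{\alpha-\beta}{\alpha-1}\right)^{n-1} \ell_{t^*}(i-n).$$

By setting $\alpha$ and $\beta$ as in the hypothesis and noticing that $t^* \ge n \ge n^{1-\delta}$ we then obtain that

$$\frac{p_{t^*}(i)}{p_{t^*}(i-1)} \le 1 + \frac{2}{n^{1-\delta}} + \left(1 + \frac{1}{n^{1-\delta}}\right)^{1-n} \frac{n^{2(1-\delta)} - 1}{n^{1-\delta}}.$$

We end this part of the proof by showing that the last addend on the right-hand side of the above expression is upper bounded by $\frac{1}{4n^{1-\delta}}$. To do so we need to prove

$$\left(1 + \frac{1}{n^{1-\delta}}\right)^{1-n} \frac{n^{2(1-\delta)} - 1}{n^{1-\delta}} \le \frac{1}{4n^{1-\delta}},$$

which is equivalent to

$$\left(1 + \frac{1}{n^{1-\delta}}\right)^{n-1} \ge 4\left(n^{2(1-\delta)} - 1\right). \qquad (9)$$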
We now lower bound the left-hand side of the latter inequality:

$$\left(1 + \frac{1}{n^{1-\delta}}\right)^{n-1} = \left(\left(1 + \frac{1}{n^{1-\delta}}\right)^{n^{1-\delta}}\right)^{\frac{n-1}{n^{1-\delta}}} \ge 2^{\frac{n-1}{n^{1-\delta}}} = \frac{2^{n^{\delta}}}{2^{1/n^{1-\delta}}} \ge \frac{2^{n^{\delta}}}{2},$$

where the first inequality follows from the fact that the function $(1 + 1/x)^x$ is greater than 2 for $x > 2$ and the second one follows from the fact that $2^{1/n^{1-\delta}} \le 2$ for $n^{1-\delta} \ge 1$. Then, since $5n^{2(1-\delta)} \ge 4(n^{2(1-\delta)} - 1)$, to prove (9) it is enough to show

$$2^{n^{\delta}} \ge 2\left(5n^{2(1-\delta)}\right) \Longleftarrow n^{\delta} \ge 2(1-\delta)\log_2(10n).$$

To prove the latter, since $\delta > 0$, it is enough to observe that the function $n^{\delta}$ is certainly bigger than the function $2\log_2(10n) \ge 2(1-\delta)\log_2(10n)$ for $n$ large enough (e.g., for $\delta = 1/2$, this is true for $n \ge 639$).
The following claim concludes the proof of the base of the induction.

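Claim 1 The upper bound holds at time step $t^*$ for $i = 4n, 2n + 1$.

Proof. We first show the claim for $i = 4n$. At time step $t^*$ FP prescribes to play $4n$. This in particular means that the strategy $4n$ achieves a payoff which is at least as much as that of action $2n + 1$ after $t^* - 1$ time steps. We write down the inequality given by this fact focusing only on the last $2n$ strategies (we will consider the first $2n$ strategies below) and obtain:

$$p_{t^*-1}(4n) + \alpha\, p_{t^*-1}(4n-1) + \beta\, p_{t^*-1}(3n) \ge \alpha\, p_{t^*-1}(4n) + p_{t^*-1}(2n+1) + \beta\, p_{t^*-1}(4n-1) \qquad (10)$$

and then since $\alpha > 1$

$$\frac{p_{t^*-1}(4n)}{p_{t^*-1}(4n-1)} \le \frac{\alpha-\beta}{\alpha-1} + \frac{\beta}{\alpha-1}\, \frac{p_{t^*-1}(3n)}{p_{t^*-1}(4n-1)} - \frac{1}{\alpha-1}\, \frac{p_{t^*-1}(2n+1)}{p_{t^*-1}(4n-1)}.$$

Similarly to the proof of Lemma 3 above this can be shown to imply

$$\frac{p_{t^*}(4n)}{p_{t^*}(4n-1)} \le \frac{1}{t^*} + \frac{\alpha-\beta}{\alpha-1} + \frac{\beta}{\alpha-1}\, \frac{p_{t^*}(3n)}{p_{t^*}(4n-1)} - \frac{1}{\alpha-1}\, \frac{p_{t^*}(2n+1)}{p_{t^*}(4n-1)} \le \frac{1}{t^*} + \frac{\alpha-\beta}{\alpha-1} + \frac{\beta}{\alpha-1}\, \frac{p_{t^*}(3n)}{p_{t^*}(4n-1)}. \qquad (11)$$

We now upper bound the ratio $\frac{\beta}{\alpha-1}\, \frac{p_{t^*}(3n)}{p_{t^*}(4n-1)}$. By repeatedly using Lemmata 3 and 4 we have that

$$p_{t^*}(4n-1) \ge \frac{\alpha-\beta}{\alpha-1}\, p_{t^*}(4n-2) \ge \left(\frac{\alpha-\beta}{\alpha-1}\right)^{2} p_{t^*}(4n-3) \ge \cdots \ge \left(\frac{\alpha-\beta}{\alpha-1}\right)^{n-1} p_{t^*}(3n).$$

This yields

$$\frac{\beta}{\alpha-1}\, \frac{p_{t^*}(3n)}{p_{t^*}(4n-1)} \le \frac{\beta}{\alpha-1} \left(\frac{\alpha-1}{\alpha-\beta}\right)^{n-1} \le \frac{1}{4n^{1-\delta}},$$

where the last inequality is proved above (see (9)). Therefore, since $t^* \ge n^{1-\delta}$, (11) implies

$$\frac{p_{t^*}(4n)}{p_{t^*}(4n-1)} \le 1 + \frac{2}{n^{1-\delta}} + \frac{1}{4n^{1-\delta}}.$$

To conclude this part of the proof we must now consider the contribution to (10) of the actions $1, \ldots, 2n$ that are not in the last block. However, Lemma 5 shows that all those actions are played with probability at most $1/2^{n^{\delta}}$ at time $t^*$. Thus the overall contribution of these strategies is upper bounded by $\frac{1}{2^{n^{\delta}}}(\alpha - \beta) \le \frac{1}{2^{n^{\delta}}}$. Similarly to the above, we observe that, for $n$ sufficiently large, $n^{\delta} \ge \log_2(4n) \ge (1 - \delta)\log_2(4n)$, which implies that $\frac{1}{2^{n^{\delta}}} \le \frac{1}{4n^{1-\delta}}$. This concludes the proof of the upper bound at time $t^*$ for $i = 4n$.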

Consider now the case $i = 2n + 1$. At time step $t^* + 1$, $4n$ is not played by FP, which means that $4n$ is not a best response after $t^*$ time steps. By Lemma 1, the best response is $2n + 1$; then, in particular, the payoff of $2n + 1$ is not smaller than the payoff of $4n$ at that time. We write down the inequality given by this fact focusing only on the last $2n$ strategies (we will consider the first $2n$ strategies below) and obtain

$$p_{t^*}(4n) + \alpha\, p_{t^*}(4n-1) + \beta\, p_{t^*}(3n) \le \alpha\, p_{t^*}(4n) + p_{t^*}(2n+1) + \beta\, p_{t^*}(4n-1)$$

and then since $\alpha > 1$

$$\frac{p_{t^*}(4n)}{p_{t^*}(4n-1)} \ge \frac{\alpha-\beta}{\alpha-1} + \frac{\beta}{\alpha-1}\, \frac{p_{t^*}(3n)}{p_{t^*}(4n-1)} - \frac{1}{\alpha-1}\, \frac{p_{t^*}(2n+1)}{p_{t^*}(4n-1)}. \qquad (12)$$

We next show that $\frac{\beta\, p_{t^*}(3n) - p_{t^*}(2n+1)}{(\alpha-1)\, p_{t^*}(4n-1)} \ge -\frac{1}{4n^{1-\delta}}$, or equivalently that $\frac{p_{t^*}(3n)}{p_{t^*}(2n+1)} \ge \frac{1}{\beta} - \frac{(\alpha-1)\, p_{t^*}(4n-1)}{4\beta n^{1-\delta}\, p_{t^*}(2n+1)}$. To prove this it is enough to show that $\frac{p_{t^*}(3n)}{p_{t^*}(2n+1)} \ge \frac{1}{\beta}$. We observe that

$$\frac{p_{t^*}(3n)}{p_{t^*}(2n+1)} = \frac{p_{t^*}(3n)}{p_{t^*}(3n-1)} \cdot \frac{p_{t^*}(3n-1)}{p_{t^*}(3n-2)} \cdots \frac{p_{t^*}(2n+2)}{p_{t^*}(2n+1)} \ge \left(\frac{\alpha-\beta}{\alpha-1}\right)^{n-1} \ge \frac{1}{\beta},$$

where the first inequality follows from Lemma 3 and the second inequality follows from the observation (similar to the above) that for $n$ sufficiently large $n^{\delta} \ge 2\log_2(2n)$. Then to summarize, for $\alpha$ and $\beta$ as in the hypothesis, (12) implies that

$$\frac{p_{t^*}(4n)}{p_{t^*}(4n-1)} \ge 1 + \frac{1}{n^{1-\delta}} - \frac{1}{4n^{1-\delta}}.$$

As above we consider actions $1, \ldots, 2n$ and observe that their contribution to the payoffs is upper bounded by $\frac{1}{4n^{1-\delta}}$. Now to conclude the proof of the claim for the case $i = 2n + 1$ we simply notice that the above implies $p_{t^*}(4n-1) \le p_{t^*}(4n)$ and Lemmata 3 and 4 imply that $p_{t^*}(2n+1) \le p_{t^*}(4n-1)$, which together prove the claim. □

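Inductive step. Now we assume the claim is true until time step $t - 1$ and we show it for time step $t$. By inductive hypothesis, the following is true, with $j \ne s_{t-1}, s_{t-1} + 1$:

$$1 + \frac{1}{n^{1-\delta}} \le \frac{p_{t-1}(j)}{p_{t-1}(j-1)} \le 1 + \frac{3}{n^{1-\delta}}, \qquad (13)$$

$$\frac{p_{t-1}(s_{t-1})}{p_{t-1}(s_{t-1} - 1)} \le 1 + \frac{3}{n^{1-\delta}},$$

$$\frac{p_{t-1}(s_{t-1} + 1)}{p_{t-1}(s_{t-1})} \le 1 + \frac{3}{n^{1-\delta}}. \qquad (14)$$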
We first consider the case in which $s_t \ne s_{t-1}$. By Lemma 1 the strategy played at time $t$ is $s_{t-1} + 1$, i.e., $s_t = s_{t-1} + 1$. Let $s_{t-1} = i$ and then we have $s_t = i + 1$. By inductive hypothesis, for all the actions $j \ne i, i + 1, i + 2$ we have

$$\frac{\alpha-\beta}{\alpha-1} = 1 + \frac{1}{n^{1-\delta}} \le \frac{p_t(j)}{p_t(j-1)} \le 1 + \frac{3}{n^{1-\delta}}. \qquad (15)$$

Indeed, for these actions $j$, $\ell_{t-1}(j) = \ell_t(j)$ and $\ell_{t-1}(j-1) = \ell_t(j-1)$. Therefore the probabilities of $j$ and $j - 1$ at time $t$ are simply those at time $t - 1$ rescaled by the same amount and the claim follows from (13). The upper bound on the ratio $\frac{p_t(i+2)}{p_t(i+1)}$ easily follows from the upper bound in (13) as $\ell_{t-1}(i+2) = \ell_t(i+2)$ and $\ell_{t-1}(i+1) < \ell_t(i+1) = \ell_{t-1}(i+1) + 1$. However, as $s_t = i + 1$, here we need to prove lower and upper bound also for the ratio $\frac{p_t(i)}{p_t(i-1)}$ and the upper bound for the ratio $\frac{p_t(i+1)}{p_t(i)}$.

Claim 2 $1 + \frac{1}{n^{1-\delta}} \le \frac{p_t(i)}{p_t(i-1)} \le 1 + \frac{3}{n^{1-\delta}}$.

Proof. To prove the claim we first focus on the last block of the game, i.e., the block in which players have strategies in $\{2n + 1, \ldots, 4n\}$. Recall that our notation uses circular arithmetic on the number of actions of the block. The fact that action $i + 1$ is better than action $i$ after $t - 1$ time steps implies that

$$p_{t-1}(i) + \alpha\, p_{t-1}(i-1) + \beta\, p_{t-1}(i-n) \le \alpha\, p_{t-1}(i) + p_{t-1}(i+1) + \beta\, p_{t-1}(i-1)$$

and then since $\alpha > 1$

$$\frac{p_{t-1}(i)}{p_{t-1}(i-1)} \ge \frac{\alpha-\beta}{\alpha-1} + \frac{\beta}{\alpha-1}\, \frac{p_{t-1}(i-n)}{p_{t-1}(i-1)} - \frac{1}{\alpha-1}\, \frac{p_{t-1}(i-2n+1)}{p_{t-1}(i-1)}. \qquad (16)$$

We next show that $\frac{\beta\, p_{t-1}(i-n) - p_{t-1}(i-2n+1)}{(\alpha-1)\, p_{t-1}(i-1)} \ge -\frac{1}{4n^{1-\delta}}$, or equivalently that $\frac{p_{t-1}(i-n)}{p_{t-1}(i-2n+1)} \ge \frac{1}{\beta} - \frac{(\alpha-1)\, p_{t-1}(i-1)}{4\beta n^{1-\delta}\, p_{t-1}(i-2n+1)}$. To prove this it is enough to show that $\frac{p_{t-1}(i-n)}{p_{t-1}(i-2n+1)} \ge \frac{1}{\beta}$. We observe that

$$\frac{p_{t-1}(i-n)}{p_{t-1}(i-2n+1)} = \frac{p_{t-1}(i-n)}{p_{t-1}(i-n-1)} \cdots \frac{p_{t-1}(i-2n+2)}{p_{t-1}(i-2n+1)} \ge \left(\frac{\alpha-\beta}{\alpha-1}\right)^{n-1} \ge \frac{1}{\beta},$$

where the first inequality follows from inductive hypothesis (we can use the inductive hypothesis as all the actions involved above are different from $i$ and $i + 1$) and the second inequality follows from the aforementioned observation that, for sufficiently large $n$, $n^{\delta} \ge 2\log_2(2n)$. Then to summarize, for $\alpha$ and $\beta$ as in the hypothesis, (16) implies that

$$\frac{p_t(i)}{p_t(i-1)} = \frac{p_{t-1}(i)}{p_{t-1}(i-1)} \ge 1 + \frac{1}{n^{1-\delta}} - \frac{1}{4n^{1-\delta}},$$

where the first equality follows from $\ell_{t-1}(i) = \ell_t(i)$ and $\ell_{t-1}(i-1) = \ell_t(i-1)$, which are true because $s_t = i + 1$. Since action $i + 1$ is worse than strategy $i$ at time step $t - 1$ we have that

$$p_{t-1}(i) + \alpha\, p_{t-1}(i-1) + \beta\, p_{t-1}(i-n) \ge \alpha\, p_{t-1}(i) + p_{t-1}(i+1) + \beta\, p_{t-1}(i-1)$$

and then since $\alpha > 1$

$$\frac{p_{t-1}(i)}{p_{t-1}(i-1)} \le \frac{\alpha-\beta}{\alpha-1} + \frac{\beta}{\alpha-1}\, \frac{p_{t-1}(i-n)}{p_{t-1}(i-1)} - \frac{1}{\alpha-1}\, \frac{p_{t-1}(i-2n+1)}{p_{t-1}(i-1)}.$$

Similarly to the proof of Lemma 3 above this can be shown to imply

$$\frac{p_t(i)}{p_t(i-1)} \le \frac{1}{t} + \frac{\alpha-\beta}{\alpha-1} + \frac{\beta}{\alpha-1}\, \frac{p_t(i-n)}{p_t(i-1)} - \frac{1}{\alpha-1}\, \frac{p_t(i-2n+1)}{p_t(i-1)} \le \frac{1}{t} + \frac{\alpha-\beta}{\alpha-1} + \frac{\beta}{\alpha-1}\, \frac{p_t(i-n)}{p_t(i-1)}. \qquad (17)$$

We now upper bound the ratio $\frac{\beta}{\alpha-1}\, \frac{p_t(i-n)}{p_t(i-1)}$. By repeatedly using the inductive hypothesis (15) we have that

$$p_t(i-1) \ge \frac{\alpha-\beta}{\alpha-1}\, p_t(i-2) \ge \left(\frac{\alpha-\beta}{\alpha-1}\right)^{2} p_t(i-3) \ge \cdots \ge \left(\frac{\alpha-\beta}{\alpha-1}\right)^{n-1} p_t(i-n).$$

(Note again that we can use the inductive hypothesis as none of the actions above is $i$ or $i + 1$.) This yields

$$\frac{\beta}{\alpha-1}\, \frac{p_t(i-n)}{p_t(i-1)} \le \frac{\beta}{\alpha-1} \left(\frac{\alpha-1}{\alpha-\beta}\right)^{n-1} \le \frac{1}{4n^{1-\delta}},$$

where the last inequality is proved above (see (9)). Therefore, since $t \ge n^{1-\delta}$, (17) implies the following

$$\frac{p_t(i)}{p_t(i-1)} \le 1 + \frac{2}{n^{1-\delta}} + \frac{1}{4n^{1-\delta}}.$$

To conclude the proof we must now consider the contribution to the payoffs of the actions $1, \ldots, 2n$ that are not in the last block. However, Lemma 5 shows that all those actions are played with probability at most $1/2^{n^{\delta}}$ at time $t^*$. Since we prove above (see Lemma 1) that these actions are not played anymore after time step $t^*$, this implies that $\sum_{j=1}^{2n} p_t(j) \le \sum_{j=1}^{2n} p_{t^*}(j) \le 2^{-n^{\delta}}$. Thus the overall contribution of these strategies is upper bounded by $\frac{1}{2^{n^{\delta}}}(\alpha - \beta) \le \frac{1}{2^{n^{\delta}}} \le \frac{1}{4n^{1-\delta}}$, where the last bound follows from the aforementioned fact that, for $n$ sufficiently large, $n^{\delta} \ge (1 - \delta)\log_2(4n)$. This concludes the proof of this claim. □



Claim 3 $\frac{p_t(i+1)}{p_t(i)} \le 1 + \frac{3}{n^{1-\delta}}$.



Proof. From (16) (and subsequent arguments) we get $p_t(i) \ge p_t(i-1)$ and from (15) we get $p_t(i-1) \ge p_t(i-2) \ge \ldots \ge p_t(i-2n+1) = p_t(i+1)$. Therefore, $p_t(i) \ge p_t(i+1)$, thus proving the upper bound. □

Finally, we consider the case in which $s_{t-1} = s_t$. In this case, for the actions $j \ne s_t, s_t + 1$ it holds $\ell_{t-1}(j) = \ell_t(j)$ and $\ell_{t-1}(j-1) = \ell_t(j-1)$. Therefore, similarly to the above, for these actions $j$ the claim follows from (13). The upper bound for the ratio $\frac{p_t(s_t+1)}{p_t(s_t)}$ easily follows from (14) as $\ell_{t-1}(s_t + 1) = \ell_t(s_t + 1)$ and $\ell_{t-1}(s_t) < \ell_t(s_t) = \ell_{t-1}(s_t) + 1$. The remaining case to analyze is the upper bound on the ratio $\frac{p_t(s_t)}{p_t(s_t - 1)}$. To prove this we can use mutatis mutandis the proof of the upper bound contained in Claim 2 with $s_t = i$. □
The claimed performance of Fictitious Play, in terms of the approximation to the best response that it computes, follows directly from this theorem.

Theorem 2 For any value of $\delta > 0$ and any time step $t$, Fictitious Play returns an $\epsilon$-NE with $\epsilon \ge \frac{1}{2} - O\left(\frac{1}{n^{1-\delta}}\right)$.



Proof. For $t \le n$ the result follows since the game is similar to [5]. In detail, for $i \le n$ the payoff associated to the best response, which in this case is $s_t + 1$, is upper bounded by 1. On the other hand, the payoff associated to
the cuixent strategy prescribed by FP is lower bounded by ^ X^j=oi where i = st- Therefore, the regret of either 

player normalized to the [0, 1] interval satisfies: e > ^ - Since ^ < 1/2, the fact that 1 - | - f + > 

(which is true given the values of $\alpha$ and $\beta$) yields the claim. For $t \le t^*$ the result follows from Lemmata 3 and 4; while the current strategy $s_t$ (for $t \le t^*$) has payoff approximately 1, the players' mixed strategies have nearly all their probability on the recently played strategies, but with no pure strategy having very high probability, so that some player is likely to receive zero payoff; by symmetry each player has payoff approximately $\frac{1}{2}$. This is made precise below, where it is applied in more detail to the case of $t > t^*$.

We now focus on the case $t > t^*$. Recall that for a set of strategies $S$, $p_t(S) = \sum_{j \in S} p_t(j)$. Let $S_t$ be the set $\{2n + 1, \ldots, s_t\} \cup \{s_t + n, \ldots, 4n\}$ if $s_t \le 3n$, or the set $\{s_t - n, \ldots, s_t\}$ in the case that $s_t > 3n$. Let $S'_t = \{2n + 1, \ldots, 4n\} \setminus S_t$. Also, let $s_t^{\max} = \arg\max_{i \in \{2n+1, \ldots, 4n\}} p_t(i)$; note that by Theorem 1, $s_t^{\max}$ is equal to either $s_t$ or its circular predecessor, that is, $s_t - 1$ if $s_t > 2n + 1$, or $4n$ if $s_t = 2n + 1$.

We start by establishing the following claim:

Claim 4 For sufficiently large n, pt{St) > 1 — ^^r-- 

Proof. To see this, note that for all x G S'^, by pt{sf^''^^) > pt{sf^'^^ — 1) and Theorem [T]we have 

Pt(gf^^^) _ p^(gmax-) p^(gmax _ -^^ p^(x + 1) 

Pt{x) ~ pt(4^^^-l)pt(sr^-2) pt{x) 

1 \ 

> (1 + 



where k is the number of factors on the right-hand side of the equality above, i.e., the number of strategies between 
X and sf^^^. Thus, as A; > 7i, 



l-k 



< (^i+_L_y""<4(i-")/("^-'). 

Hence pt{S[) < (2n)4(i-")/("'"') = ^^^^^^ < where the last inequality follows from the fact that, for 

large n, 4^/"^ * < 2. Then pt{St) > 1 — Pt{S't) — • • • , 2n}), which establishes the claim, since Lemma|5] 

establishes a strong enough upper bound on p(({l, . . . , 2n}). □ 



Claim 5 st, the current best response at time t, has payoff at least /? ( 1 



2n-l 

Proof. $s_t$ receives a payoff of at least $\beta$ when the opponent plays any strategy from $S_t$; the claim follows using Claim 4. □

Let $E_t$ denote the expected payoff to either player that would result if they both select a strategy from the mixed distribution that allocates to each strategy $x$ the probability $p_t(x)$. The result will follow from the following claim:

Claim 6 For sufficiently large n, Et < ^ + -^^^ + 
Proof. The contribution to $E_t$ from strategies in $\{1, \ldots, 2n\}$, together with strategies in $S'_t$, may be upper-bounded by $\alpha$ times the probability that any of those strategies get played. This probability is, by Lemma 5 and Claim 4, exponentially small, namely $2n/2^{n^{\delta}}$.

Suppose instead that both players play from $S_t$. If they play different strategies, their total payoff will be at most $\alpha$, since one player receives payoff 0. If they play the same strategy, they both receive payoff 1. We continue by upper-bounding the probability that they both play the same strategy. This is upper-bounded by the largest probability assigned to any single strategy, namely $p_t(s_t^{\max})$.

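Suppose for contradiction that $p_t(s_t^{\max}) > 6/n^{1-\delta}$. At this point, note that by Theorem 1, for any strategy $s \in S_t$, we have

$$\frac{p_t(s_t^{\max})}{p_t(s)} \le \left(1 + \frac{3}{n^{1-\delta}}\right)^{k},$$

where $k$ is the distance between $s$ and $s_t^{\max}$. Therefore, denoting $r = \left(1 + \frac{3}{n^{1-\delta}}\right)^{-1}$, we obtain

$$p_t(S_t) = \sum_{s \in S_t} p_t(s) = p_t(s_t) + \sum_{i = s_t - n}^{s_t - 1} p_t(i) \ge p_t(s_t^{\max}) \sum_{k=0}^{n-1} r^{k}.$$

Applying the standard formula for the partial sum of a geometric series we have

$$p_t(S_t) > \frac{6}{n^{1-\delta}} \cdot \frac{1 - r^{n}}{1 - r}.$$

Noting that $1 - r^{n} > \frac{1}{2}$ we have $p_t(S_t) > \frac{6}{n^{1-\delta}} \cdot \left(\frac{1}{2}\right) \cdot \left(\frac{n^{1-\delta} + 3}{3}\right)$, which is greater than 1, a contradiction.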
The expected payoff Et to either player, is, by symmetry, half the expected total payoff, so we have Et < {1 



2 



" + ^ + which yields the claim. □ 



We now show that Fictitious Play never achieves an $\epsilon$-value better than $\frac{1}{2} - O\left(\frac{1}{n^{1-\delta}}\right)$. From the last two claims the regret of either player normalized to [0, 1] is

B f 2n- 1\ 1 6 2n 



rv"-^ + 1 \ f 2n-l 



1 6 2n 



2 n^~^ + 1 2"^ 
1 7i^~^ + 1 n^-^ + 1 2n - 1 



2 n2(i-'5) + ni-^ n2{i-5) + n^- 



n 

1 

"2 







6 


2n 


i--^ + 1 


217.1 






I 

V 





■5 on* 



This concludes the proof. □ 
3 Upper bound



In this section, $n$ denotes the number of pure strategies of both players. Let $a$, $b$ denote the FP sequences of pure strategies of length $t$, for players 1 and 2 respectively. Let $a_{[k:\ell]}$ denote the subsequence $a_k, \ldots, a_\ell$. We overload notation and use $a$ to also denote the mixed strategy that is uniform on the corresponding sequence.

Let $m^*$ be a best response against $b$, and let $\epsilon$ denote the smallest $\epsilon$ for which $a$ is an $\epsilon$-best-response against $b$. To derive a bound on $\epsilon$, we use the most recent occurrence of each pure strategy in $a$. For $k \in \{1, \ldots, t\}$, let $f(k)$ denote the last occurrence of $a_k$ in the sequence $a$, that is,

$$f(k) := \max_{\ell \in \{1, \ldots, t\},\ a_\ell = a_k} \ell.$$

We have the following. 



e = ui{m* , b) — til (a, b) 
1 * 

= - ^ (ui(m*, b) - ui{ai, b)) 



1=1 
t 



1 ^ f{i)^ 1 ^^^ ^ ^[l:/(i)~l] ) - Ul {ai , )) + ''^* ' ^[^(*)^*] ~ ' 

2=1 



< -^X] [ J (^^l("^*>^[/W:t]) -^^l(ai,fo[/(i):f]))] (18) 

i=l 

1=1 

= l + ^-^E/« (20) 

i=l 

Inequality (18) holds since $a_i = a_{f(i)}$ is a best response against $b_{[1:f(i)-1]}$ by definition. Inequality (19) holds since payoffs
are in the range [0, 1]. To provide a guarantee on the performance of FP, we find the sequence $a$ that maximizes the
RHS of (20), i.e., that minimizes $\sum_{i=1}^{t} f(i)$.
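
The bound (20) can be checked numerically on a small example. The following is a minimal sketch, not part of the original analysis: it assumes a random $n \times n$ bimatrix game with payoffs in [0, 1], both players starting with their first pure strategy, and ties broken by lowest index. The function names (`fictitious_play`, `last_occurrence`, `regret_player1`, `bound_20`) and the `first` parameter are hypothetical and introduced only for illustration.

```python
import random

def fictitious_play(A, B, t, first=(0, 0)):
    """Run t iterations of FP. A[r][c], B[r][c] are the payoffs of players 1 and 2
    when player 1 plays r and player 2 plays c. `first` is an assumed starting pair."""
    n = len(A)
    a, b = [first[0]], [first[1]]
    for _ in range(t - 1):
        # Each player best-responds to the opponent's sequence so far
        # (ties broken by lowest index, an arbitrary choice).
        next_a = max(range(n), key=lambda r: sum(A[r][c] for c in b))
        next_b = max(range(n), key=lambda c: sum(B[r][c] for r in a))
        a.append(next_a)
        b.append(next_b)
    return a, b

def last_occurrence(a):
    """Return [f(1), ..., f(t)] using 1-indexed positions."""
    last = {}
    for pos, s in enumerate(a, start=1):
        last[s] = pos  # later positions overwrite earlier ones
    return [last[s] for s in a]

def regret_player1(A, a, b):
    """epsilon = u_1(m*, b) - u_1(a, b), with a and b read as uniform mixtures."""
    t, n = len(a), len(A)
    payoff = lambda r: sum(A[r][c] for c in b) / t  # u_1(r, b)
    best = max(payoff(r) for r in range(n))
    mixed = sum(payoff(r) for r in a) / t
    return best - mixed

def bound_20(a):
    """Right-hand side of (20): 1 + 1/t - (1/t^2) * sum_i f(i)."""
    t = len(a)
    return 1 + 1 / t - sum(last_occurrence(a)) / t**2

random.seed(1)
n, t = 5, 50
A = [[random.random() for _ in range(n)] for _ in range(n)]
B = [[random.random() for _ in range(n)] for _ in range(n)]
a, b = fictitious_play(A, B, t)
print(regret_player1(A, a, b), "<=", bound_20(a))
```

On typical random games the regret is far below the bound; the bound is only a worst-case guarantee over all FP sequences.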

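Definition 1 For an FP sequence $a$, let $S(a) := \sum_{i=1}^{t} f(a_i)$, where $f(a_i)$ denotes the last occurrence in $a$ of the pure strategy $a_i$ (that is, $f(i)$ in the notation above), and let $\hat{a} = \arg\min_a S(a)$.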

The following three lemmata allow us to characterize $\hat{a}$, the sequence that minimizes $S(a)$.

Lemma 6 The entries of $\hat{a}$ take on exactly $n$ distinct values.

Proof. The entries of an FP sequence can take on at most $n$ distinct values. Suppose for the sake of contradiction
that the entries of $\hat{a}$ take on strictly fewer than $n$ distinct values. Then there is a pure strategy, say $m$, that does
not appear in $\hat{a}$ and a pure strategy $m'$ that appears more than $t/n$ times. Obtain $a$ from $\hat{a}$ by replacing a single
occurrence of $m'$ in $\hat{a}$ with $m$. Then $S(a) < S(\hat{a})$, a contradiction. □

We now define a transformation of an FP sequence $a$ into a new sequence $a'$ so that $S(a') < S(a)$ if $a \neq a'$.

Definition 2 Suppose the entries of $a$ take on $d$ distinct values. We define $x_1, \ldots, x_d$ to be the pure strategies appearing in $a$, ordered by their last occurrences
$\{f(a_i) \mid i \in [t]\}$ in ascending order. Formally, let $x_d := a_t$ and for $k < d$ let $x_k := a_i$, where

$$i := \max \{\, j \in \{1, \ldots, t\} : a_j \notin \{x_{k+1}, \ldots, x_d\} \,\}.$$
For $i = 1, \ldots, d$, let

$$\#(x_i) := |\{\, j \in [t] : a_j = x_i \,\}|,$$

which is the number of occurrences of $x_i$ in $a$. Define $a'$ as

$$a' := \underbrace{x_1, \ldots, x_1}_{\#(x_1)},\ \underbrace{x_2, \ldots, x_2}_{\#(x_2)},\ \ldots,\ \underbrace{x_d, \ldots, x_d}_{\#(x_d)}.$$

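Lemma 7 For any FP sequence $a$, let $a'$ be as in Definition 2. If $a' \neq a$ then $S(a') < S(a)$.

Proof. Every pure strategy occurs equally often in $a'$ and in $a$, and its last occurrence in $a'$ is no later than its last occurrence in $a$. Since $a' \neq a$, there is at least one pure strategy whose last occurrence in $a'$ is strictly earlier than in
$a$. □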
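
The transformation of Definition 2 and the inequality of Lemma 7 can be illustrated on short sequences. This is a minimal sketch under the assumption that any list of pure-strategy labels may be used as input (the lemma itself is stated for FP sequences); the names `S` and `sort_by_last_occurrence` are hypothetical.

```python
def S(a):
    """Sum over positions i of the last occurrence (1-indexed) of the strategy a_i."""
    last = {s: pos for pos, s in enumerate(a, start=1)}  # later positions overwrite earlier ones
    return sum(last[s] for s in a)

def sort_by_last_occurrence(a):
    """Build a' from a: group equal entries into blocks, ordered by last occurrence in a."""
    last = {s: pos for pos, s in enumerate(a, start=1)}
    order = sorted(last, key=last.get)  # x_1, ..., x_d
    a_prime = []
    for x in order:
        a_prime.extend([x] * a.count(x))  # block of length #(x)
    return a_prime

a = [1, 2, 1]
a_prime = sort_by_last_occurrence(a)
print(a_prime, S(a), S(a_prime))  # [2, 1, 1] 8 7
```

Here strategy 2 has the earlier last occurrence, so its block moves to the front and $S$ drops from 8 to 7.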

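Lemma 8 Let $n, t \in \mathbb{N}$ be such that $n \mid t$. Let $a$ be a sequence of length $t$ of the form

$$\underbrace{1, \ldots, 1}_{c_1},\ \underbrace{2, \ldots, 2}_{c_2},\ \ldots,\ \underbrace{n, \ldots, n}_{c_n}.$$

Then $S(a)$ is minimized if and only if $c_1 = \cdots = c_n = t/n$.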

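Proof. We refer to the maximal-length subsequence of entries with value $u \in \{1, \ldots, n\}$ as block $u$. Consider two
adjacent blocks $u$ and $u + 1$, where block $u$ starts at $i$ and block $u + 1$ starts at $j$ and finishes at $k$. The contribution
of these two blocks to $S(a)$ is

$$\sum_{\ell=i}^{j-1} (j-1) + \sum_{\ell=j}^{k} k = j^2 - (k+i+1)j + (k^2 + k + i).$$

For fixed $i$ and $k$ this is a convex quadratic in $j$. If $k + i$ is odd, the contribution is minimized when $j = \frac{k+i+1}{2}$, i.e., when the two blocks have equal length. If $k + i$ is even, it is minimized for
both values $j = \lfloor \frac{k+i+1}{2} \rfloor$ and $j = \lceil \frac{k+i+1}{2} \rceil$, i.e., when the block lengths differ by one.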

Now suppose for the sake of contradiction that $S(a)$ is minimized by a sequence for which $c_1 = \cdots = c_n = t/n$ does not hold.
There are two possibilities. If there are two adjacent blocks whose lengths differ by more than one, we
immediately have a contradiction. If not, then all pairs of adjacent blocks differ in
length by at most one. In particular, there must be a block of length $t/n + 1$ and another of length $t/n - 1$ with all
blocks in between of length $t/n$. Swapping the leftmost of these blocks with its right neighbor does not change the
sum $S(a)$. Repeatedly doing this until the blocks of lengths $t/n + 1$ and $t/n - 1$ are adjacent does not change $S(a)$.
Then we have two adjacent blocks that differ in length by more than one, which contradicts the fact that $S(a)$ was
minimized. □
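
For small $n$ and $t$, Lemma 8 can be confirmed by exhaustive search over block lengths. The sketch below is illustrative only: it assumes every block is non-empty (as Lemma 6 requires for the minimizer), and the helper names `S_blocks` and `minimizers` are hypothetical.

```python
from itertools import product

def S_blocks(c):
    """S(a) for a = 1^{c_1} 2^{c_2} ... n^{c_n}.
    Every entry of a block has its last occurrence at the block's end position,
    so a block contributes (block length) * (end position)."""
    total, end = 0, 0
    for length in c:
        end += length
        total += length * end
    return total

def minimizers(n, t):
    """Return the minimum of S over all block-length vectors with positive entries
    summing to t, together with the list of vectors attaining it."""
    comps = [c for c in product(range(1, t + 1), repeat=n) if sum(c) == t]
    best = min(S_blocks(c) for c in comps)
    return best, [c for c in comps if S_blocks(c) == best]

print(minimizers(3, 6))  # (24, [(2, 2, 2)])
```

The search is exponential in $n$ and is meant only as a sanity check for tiny instances.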

Theorem 3 If $n \mid t$, the FP strategies $(a, b)$ are an $\epsilon^*$-equilibrium, where

$$\epsilon^* = \frac{1}{2} + \frac{1}{t} - \frac{1}{2n}.$$

Proof. By symmetry, it suffices to show that $a$ is an $\epsilon^*$-best-response against $b$. Applying Lemma 6, Lemma 7 and
Lemma 8 we have that

$$\hat{a} = \underbrace{m_1, \ldots, m_1}_{t/n},\ \underbrace{m_2, \ldots, m_2}_{t/n},\ \ldots,\ \underbrace{m_n, \ldots, m_n}_{t/n},$$

where $m_1, \ldots, m_n$ is an arbitrary labeling of player 1's pure strategies. Using (20), we have that

$$
\begin{aligned}
\epsilon &\le 1 + \frac{1}{t} - \frac{1}{t^2} \sum_{i=1}^{n} \frac{t}{n} \cdot \frac{i\,t}{n} \\
&= 1 + \frac{1}{t} - \frac{n+1}{2n} \\
&= \frac{1}{2} + \frac{1}{t} - \frac{1}{2n}.
\end{aligned}
$$
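
The arithmetic in the last display can be verified with exact rational numbers. This is a small sketch, assuming equal blocks of length $t/n$ as in the sequence $\hat{a}$ above; `bound_equal_blocks` and `eps_star` are hypothetical helper names.

```python
from fractions import Fraction

def bound_equal_blocks(n, t):
    """RHS of (20) for the sequence made of n blocks of length t/n each."""
    if t % n != 0:
        raise ValueError("n must divide t")
    block = t // n
    # Block u ends at position u * block, and each of its entries contributes that value.
    S = sum(block * (u * block) for u in range(1, n + 1))
    return 1 + Fraction(1, t) - Fraction(S, t * t)

def eps_star(n, t):
    """Closed form 1/2 + 1/t - 1/(2n)."""
    return Fraction(1, 2) + Fraction(1, t) - Fraction(1, 2 * n)

for n in range(1, 7):
    for mult in range(1, 5):
        assert bound_equal_blocks(n, n * mult) == eps_star(n, n * mult)

print(eps_star(3, 6))  # 1/2
```

For $n = 3$ and $t = 6$ the terms $1/t$ and $1/(2n)$ cancel, which gives exactly $1/2$.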



This concludes the proof. □

For $t$ superlinear in $n$, we asymptotically achieve a $\left(\frac{1}{2} - \frac{1}{2n}\right)$-Nash equilibrium.

4 Discussion



Daskalakis et al. Q gave a very simple algorithm that achieves an approximation guarantee of $\frac{1}{2}$; subsequent
algorithms, e.g. [3, 12], improved on this, but at the expense of being more complex and centralized, commonly
solving one or more LPs derived from the game. Our result suggests that further work on the topic might address
the question of whether $\frac{1}{2}$ is a fundamental limit to the approximation performance obtainable by certain types
of algorithms that are in some sense simple or decentralized. The question of specifying appropriate classes of 
algorithms is itself challenging, and is also considered in lH in the context of algorithms that provably fail to find 
Nash equilibria without computational complexity theoretic assumptions. 